Graph-Based Extractive Text Summarization Sentence Scoring Scheme for Big Data Applications
Authors
Abstract
The recent advancements in big data and natural language processing (NLP) have necessitated proficient text mining (TM) schemes that can interpret and analyze voluminous textual data. Text summarization (TS) acts as an essential pillar within recommendation engines. Despite the prevalent use of abstractive techniques in TS, an anticipated shift towards graph-based extractive TS (ETS) schemes is becoming apparent. ETS models, although simpler and less resource-intensive, are key in assessing reviews and feedback on products or services. Nonetheless, current methodologies have not fully resolved concerns surrounding complexity, adaptability, and computational demands. Thus, we propose our scheme, GETS, utilizing a graph-based model to forge connections among words and sentences through statistical procedures. The structure encompasses a post-processing stage that includes sentence clustering. Employing the Apache Spark framework, the scheme is designed for parallel execution, making it adaptable to real-world applications. For evaluation, we selected 500 documents from the WikiHow and Opinosis datasets, categorized them into five classes, and applied the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) parameters for comparison, with the measures ROUGE-1, ROUGE-2, and ROUGE-L. The results include recall scores of 0.3942, 0.0952, and 0.3436 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively (when using the clustered approach). Through a juxtaposition with existing models such as BERTEXT (with 3-gram, 4-gram) and MATCHSUM, our scheme has demonstrated notable improvements, substantiating its applicability and effectiveness in real-world scenarios.
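The abstract describes scoring sentences over a graph and evaluating the result with ROUGE recall. The sketch below illustrates that general idea only; it is not the GETS scheme. It assumes a TextRank-style power iteration over cosine-similarity edges as a stand-in for the paper's statistical procedures, runs on a single machine rather than Apache Spark, and omits the clustering stage. All function names (`tokenize`, `score_sentences`, `summarize`, `rouge1_recall`) and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a TextRank-style iteration (assumed stand-in, not GETS)."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    # Edge weights: pairwise cosine similarity, with no self-loops.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, proportional to edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: clipped unigram overlap divided by the reference unigram count."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum(min(count, cand[t]) for t, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


# Hypothetical review sentences, used only to exercise the functions.
reviews = [
    "The battery life is great.",
    "Battery life lasts all day and the battery charges fast.",
    "I bought it on a Tuesday.",
]
summary = summarize(reviews, k=2)
```

Here the third sentence shares no words with the others, so it has no edges in the graph and receives only the baseline score; the two battery-related sentences reinforce each other and are selected. A parallel version would distribute the pairwise similarity computation, which is the costly step for large document sets.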
Similar resources
Assessing sentence scoring techniques for extractive text summarization
0957-4174/$ see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2013.04.023 Corresponding author. Tel.: +55 8197885665. Authors: Rafael Ferreira, L. de Souza Cabral, R.D. Lins, G. Pereira e Silva, F. Freitas, G.D.C. Cavalcanti, ...
Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization
Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...
Extractive Based Automatic Text Summarization
Automatic text summarization is the process of reducing the text content and retaining the important points of the document. Generally, there are two approaches for automatic text summarization: Extractive and Abstractive. The process of extractive based text summarization can be divided into two phases: pre-processing and processing. In this paper, we discuss some of the extractive based text ...
A new sentence similarity measure and sentence based extractive technique for automatic text summarization
The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is a process of...
Topical Coherence for Graph-based Extractive Summarization
We present an approach for extractive single-document summarization. Our approach is based on a weighted graphical representation of documents obtained by topic modeling. We optimize importance, coherence and non-redundancy simultaneously using ILP. We compare ROUGE scores of our system with state-of-the-art results on scientific articles from PLOS Medicine and on DUC 2002 data. Human judges ev...
Journal
Journal title: Information
Year: 2023
ISSN: 2078-2489
DOI: https://doi.org/10.3390/info14090472